Illumination-Based  Image  Synthesis:  Creating  Novel  Images  of 
Human  Faces  Under  Differing  Pose  and  Lighting* 

Athinodoros  S.  Georghiades  Peter  N.  Belhumeur  David  J.  Kriegman 


Center  for  Computational  Vision  and  Control 
Yale  University 
New  Haven,  CT  06520-8267 


Abstract 

We present an illumination-based method for synthesizing images of an
object under novel viewing conditions. Our method requires as few as
three images of the object taken under variable illumination, but from
a fixed viewpoint. Unlike multi-view based image synthesis, our method
does not require the determination of point or line correspondences.
Furthermore, our method is able to synthesize not simply novel
viewpoints, but novel illumination conditions as well. We demonstrate
the effectiveness of our approach by generating synthetic images of
human faces.

1  Introduction 

We present an illumination-based method for creating novel images of an
object under differing pose and lighting. This method uses as few as
three images of the object taken under variable lighting but fixed pose
to estimate the object's albedo and generate its geometric structure.
Our approach does not require any knowledge about the light source
directions in the modeling images, or the establishment of point or
line correspondences.

In contrast, nearly all approaches to view synthesis or image-based
rendering take a set of images gathered from multiple viewpoints and
apply techniques akin to structure from motion [17, 28, 6], stereopsis
[21, 9], image transfer [3], image warping [18, 20, 24], or image
morphing [7, 23]. Each of these methods requires the establishment of
correspondence between image data (e.g. pixels) across the set. (Unlike
other methods, the Lumigraph [12, 19] exhaustively samples the ray space
and renders images of an object from novel viewpoints by taking 2-D
slices of the 4-D light field at the appropriate directions.) Since
dense correspondence is difficult to obtain, most methods extract sparse
image features (e.g. corners, lines), and may use multi-view geometric
constraints (e.g. the trifocal tensor [2, 1]) or scene-dependent
geometric constraints [9, 8] to reduce the search process and constrain
the estimates. By using a sequence of images taken at nearby viewpoints,
incremental tracking can further simplify the process, particularly when
features are sparse.

For these approaches to be effective, there must be sufficient texture
or viewpoint-independent scene features, such as albedo discontinuities
or surface normal discontinuities. From sparse correspondence, the
epipolar geometry can be established and stereo techniques can be used
to provide dense reconstruction. Underlying nearly all such stereo
algorithms is a constant brightness assumption - that is, the intensity
(irradiance) of corresponding pixels should be the same. In turn,
constant brightness implies two seldom stated assumptions: (1) the scene
is Lambertian, and (2) the lighting is static with respect to the scene
- only the viewpoint is changing.

In the presented illumination-based approach, we also assume that the
surface is Lambertian, although here this assumption is made explicit.
As a dual to the second point listed above, our method requires that the
camera remains static with respect to the scene - only the lighting is
changing. As a consequence, geometric correspondence is trivially
established, and so the method can be applied to scenes where it is
difficult to establish multi-viewpoint correspondence, namely scenes
that are highly textured (i.e. where image features are not sparse) or
scenes that completely lack texture (i.e. where there are insufficient
image features).

At the core of our approach for generating novel viewpoints is a variant
of photometric stereo [27, 29, 14, 13, 30] which simultaneously
estimates geometry and albedo across the scene. However, the main
limitation of classical photometric stereo is that the light source
positions must be accurately known, and this necessitates a fixed
lighting rig as might be possible in an industrial setting. In contrast,
the proposed method does not require knowledge of light source
locations, and so illumination could be varied by simply waving a light
around the scene.

In fact, our method derives from work by Belhumeur and Kriegman in [5]
where they showed that a small set of images with unknown light source
directions can be used to generate a representation - the illumination
cone - which models the complete set of images of an object (in fixed
pose) under all possible illumination. This method had as its precursor
the work of Shashua [25] who showed that, in the absence of shadows, the
set of images of an object lies in a 3-D subspace in the image space.
Generated images from the illumination cone representation accurately
depict shading and attached shadows under extreme lighting; in [11] the
cone representation was extended to include cast shadows (shadows the
object casts on itself) for objects with non-convex shapes. Unlike
attached shadows, cast shadows are global effects, and their prediction
requires the reconstruction of the object's surface.

In generating the geometric structure, multi-viewpoint methods typically
estimate depth directly from corresponding image points [21, 9]. It is
well known that without sub-pixel correspondence, stereopsis provides a
modest number of disparities over the effective operating range, and so
smoothness or regularization constraints are used to interpolate and
provide smooth surfaces. The presented illumination-based method
estimates surface normals which are then integrated to generate a
surface. As a result, very subtle changes in depth are recovered, as
demonstrated in the synthetic images in Figures 4 and 5. Those images
also show the effectiveness of our approach in generating realistic
images of faces under novel pose and illumination conditions.

2  Illumination  Modeling 

In [5], Belhumeur and Kriegman have shown that, for a convex object with
a Lambertian reflectance function, the set of all images under an
arbitrary combination of point light sources forms a convex polyhedral
cone in the image space $\mathbb{R}^n$ which can be constructed with as
few as three images.

Let $x \in \mathbb{R}^n$ denote an image with $n$ pixels of a convex
object with a Lambertian reflectance function illuminated by a single
point source at infinity. Let $B \in \mathbb{R}^{n \times 3}$ be a
matrix where each row in $B$ is the product of the albedo with the
inward pointing unit normal for a point on the surface projecting to a
particular pixel in the image. A point light source at infinity can be
represented by $s \in \mathbb{R}^3$ signifying the product of the light
source intensity with a unit vector in the direction of the light
source. A convex Lambertian surface with normals and albedo given by
$B$, illuminated by $s$, produces an image $x$ given by

$$x = \max(Bs, 0), \tag{1}$$

where $\max(Bs, 0)$ sets to zero all negative components of the vector
$Bs$. The pixels set to zero correspond to the surface points lying in
an attached shadow. Convexity of the object's shape is assumed at this
point to avoid cast shadows. It should be noted that when no part of the
surface is shadowed, $x$ lies in the 3-D subspace $\mathcal{C}$ given by
the span of the columns of $B$.


If an object is illuminated by $k$ light sources at infinity, then the
image is given by the superposition of the images which would have been
produced by the individual light sources, i.e.,

$$x = \sum_{i=1}^{k} \max(B s_i, 0), \tag{2}$$

where $s_i$ is a single light source. Due to the inherent superposition,
it follows that the set of all possible images $\mathcal{C}$ of a convex
Lambertian surface created by varying the direction and strength of an
arbitrary number of point light sources at infinity is a convex cone. It
is also evident from Equation 2 that this convex cone is completely
described by matrix $B$.
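
Equations 1 and 2 can be illustrated with a minimal NumPy sketch. The
function name `render` and the four-pixel matrix `B` below are
hypothetical and chosen only for illustration; they are not taken from
our implementation.

```python
import numpy as np

def render(B, sources):
    """Image of a convex Lambertian surface, Eq. (2): sum_i max(B s_i, 0).

    B       : (n, 3) albedo-scaled surface normals, one row per pixel.
    sources : (3,) or (k, 3) light sources (intensity times unit direction).
    """
    sources = np.atleast_2d(sources)            # (k, 3)
    shading = B @ sources.T                     # (n, k), one column per source
    # Negative entries are attached shadows and are clipped to zero.
    return np.maximum(shading, 0.0).sum(axis=1)

# Toy example (hypothetical values): four pixels, two light sources.
B = np.array([[ 0.0, 0.0, 1.0],
              [ 0.6, 0.0, 0.8],
              [-0.6, 0.0, 0.8],
              [ 0.0, 1.0, 0.0]])
s1 = np.array([1.0, 0.0, 0.0])
s2 = np.array([0.0, 0.0, 2.0])

x1 = render(B, s1)          # single source, Eq. (1)
x2 = render(B, s2)
x12 = render(B, [s1, s2])   # both sources together, Eq. (2)
```

Here `x12` equals `x1 + x2`, and scaling a source scales its image by the
same non-negative factor, which is the sense in which the set of images
is a convex cone.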

This suggests a way to construct the illumination model for an
individual: gather three or more images of the face without shadowing,
each illuminated by a single light source at an unknown location but
viewed under fixed pose, and use them to estimate the three-dimensional
illumination subspace $\mathcal{C}$. This can be done by first
normalizing the images to unit length and then estimating the best
three-dimensional orthogonal basis $B^*$ using a least-squares
minimization technique such as singular value decomposition (SVD). Note
that the basis $B^*$ differs from $B$ by an unknown linear
transformation, i.e., $B = B^*A$ where $A \in GL(3)$ [10, 13, 22]; for
any light source $s$, $x = Bs = (B^*A)(A^{-1}s)$. Nevertheless, both
$B^*$ and $B$ define the same illumination cone and represent valid
illumination models.
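
The following NumPy sketch illustrates the SVD-based estimate and the
$GL(3)$ ambiguity on synthetic data. The matrices `B` and `S` are
randomly generated stand-ins (an assumption for illustration), the
shadow-free case is simulated by omitting the clipping in Equation 1,
and the normalization of the images to unit length is skipped for
brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 200, 6                       # pixels, images (hypothetical sizes)
B = rng.normal(size=(n, 3))         # stand-in for albedo-scaled normals
S = rng.normal(size=(3, c))         # stand-in light sources, one per column
X = B @ S                           # shadow-free images: no max(., 0)

# Best rank-3 factorization of the data matrix in the least-squares sense.
U, sing, Vt = np.linalg.svd(X, full_matrices=False)
B_star = U[:, :3]                   # orthonormal basis of the 3-D subspace
S_star = np.diag(sing[:3]) @ Vt[:3]

# B_star is not B itself: it differs by an unknown invertible 3x3 matrix A.
A, *_ = np.linalg.lstsq(B_star, B, rcond=None)
```

In this noise-free setting `B_star @ S_star` reproduces `X`, and
`B_star @ A` reproduces `B`; with real data `A` is of course unknown.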

Unfortunately, using SVD in the above procedure leads to an inaccurate
estimate of $B^*$. For even a convex object whose Gaussian image covers
the Gauss sphere, there is only one light source direction (the viewing
direction) for which no point on the surface is in shadow. For any other
light source direction, shadows will be present. If the object is
non-convex, such as a face, then shadowing in the modeling images is
likely to be more pronounced. When SVD is used to find $B^*$ from images
with shadows, these systematic errors bias its estimate significantly.
Therefore, an alternative way is needed to find $B^*$ that takes into
account the fact that some data values should not be used in the
estimation.

We have implemented a variation of [26] (see also [28, 16]) that finds a
basis $B^*$ for the 3-D linear subspace $\mathcal{C}$ from image data
with missing elements. To begin, define the data matrix for $c$ images
of an individual to be $X = [x_1 \ldots x_c]$. If there were no
shadowing, $X$ would be rank 3 (assuming no image noise), and we could
use SVD to factorize $X$ into $X = B^*S^*$ where $S^*$ is a $3 \times c$
matrix the columns of which are the light source directions scaled by
the light intensities, $s_i^*$, for all $c$ images.

Since the images have shadows (both cast and attached), and possibly
saturations, we first have to determine which data values are invalid.
Unlike saturations, which can be trivially determined, finding shadows
is more involved. In our implementation, a pixel is assigned to be in
shadow if its value divided by its corresponding albedo is below a
threshold. As an initial estimate of the albedo, we use the average of
the modeling (or training) images. A conservative threshold is then
chosen to determine shadows, making it almost certain that no invalid
data is included in the estimation process, at the small expense of
throwing away some valid data. After finding the invalid data, the
following estimation method is used: without doing any row or column
permutations, sift out all the full rows (with no invalid data) of
matrix $X$ to form a full sub-matrix $\tilde{X}$. Note that the number
of pixels in an image (i.e. the number of rows of $X$) is much larger
than the number of images (i.e. the number of columns of $X$), which
means we can always find a large number of full rows so that the number
of rows of $\tilde{X}$ is larger than its number of columns. Therefore,
perform SVD on $\tilde{X}$ to get a fairly good initial estimate of
$S^*$. Fix $S^*$ and estimate each of the rows of $B^*$ independently
using least squares. Then, fix $B^*$ and update each of the light source
directions $s_i^*$ independently, again using least squares. Repeat
these last two steps until the estimates converge. In our experiments,
the algorithm is very well behaved, converging to the global minimum
within 10-15 iterations. Though it is possible to converge to a local
minimum, we never observed this either in simulation or in practice.
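
The estimation procedure above can be sketched in NumPy as follows. This
is a minimal illustration under stated assumptions, not our
implementation: the function names, the threshold value and the optional
saturation level are hypothetical, and a fixed iteration count stands in
for a convergence test.

```python
import numpy as np

def valid_mask(X, thresh=0.2, saturation=None):
    """Mark usable entries of the n x c data matrix X.

    A pixel is treated as shadowed when its value divided by the initial
    albedo estimate (the mean over the images) falls below `thresh`.
    The default threshold is an assumed value, not one from the text.
    """
    albedo = X.mean(axis=1, keepdims=True)
    valid = X / np.maximum(albedo, 1e-12) >= thresh
    if saturation is not None:
        valid &= X < saturation
    return valid

def factor_with_missing(X, valid, iters=15):
    """Alternating least squares for X ~ B S using only valid entries."""
    n, c = X.shape
    full = valid.all(axis=1)                    # rows with no invalid data
    _, sing, Vt = np.linalg.svd(X[full], full_matrices=False)
    S = np.diag(sing[:3]) @ Vt[:3]              # initial estimate of S*
    B = np.zeros((n, 3))
    for _ in range(iters):
        for i in range(n):                      # fix S*, solve each row of B*
            m = valid[i]
            B[i] = np.linalg.lstsq(S[:, m].T, X[i, m], rcond=None)[0]
        for j in range(c):                      # fix B*, solve each column of S*
            m = valid[:, j]
            S[:, j] = np.linalg.lstsq(B[m], X[m, j], rcond=None)[0]
    return B, S
```

Each row of $B^*$ needs at least three valid measurements for its
least-squares problem to be well posed, which is the constraint on light
source motion discussed below.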

Figure 1 demonstrates the process for constructing the illumination
model. Figure 1.a shows six of the original single light source images
of a face used in the estimation of $B^*$. Note that the light source in
each image moves only by a small amount (±15° in either direction) about
the viewing axis. Despite this, the images do exhibit shadowing, e.g.
left and right of the nose. In fact, there is a tradeoff in the image
acquisition process: the smaller the motion of the light source, meaning
fewer shadows present in the images, the worse the conditioning of the
estimation problem. If, on the other hand, the light source moves
excessively, despite the improvement in the conditioning, more extensive
shadowing can increase the possibility of having too few (fewer than
three) valid measurements with a fixed number of images for some parts
of the face. Therefore, the light source should move in moderation, as
in the images shown in Figure 1.a.

Figure 1.b shows the basis images of the estimated matrix $B^*$. These
basis images encode not only the albedo (reflectance) of the face but
also its surface normal field. They can be used to construct images of
the face under arbitrary and quite extreme illumination conditions.
However, the image formation model in Equation 1 does not account for
cast shadows of non-convex objects such as faces. In order to determine
which parts of the image are in cast shadows, given a light source
direction, we need to reconstruct the surface of the face (see next
section) and then use ray-tracing techniques.


Figure 1: a) Six of the original single light source images used to
estimate $B^*$. Note that the light source in each image moves only by a
small amount (±15° in either direction) about the viewing axis. Despite
this, the images do exhibit shadowing. b) The basis images of $B^*$.

3  Surface  Reconstruction 

In this section, we demonstrate how we can generate an object's surface
from $B^*$ after enforcing the integrability constraint on the surface
normal field. It has been shown [4, 31] that from multiple images, in
which the light source directions are unknown, one can only recover a
Lambertian surface up to a three-parameter family given by the
generalized bas-relief (GBR) transformation. This family scales the
relief (flattens or extrudes) and introduces an additive plane. It has
also been shown that the family of GBR transformations is the only one
that preserves integrability.

3.1  Enforcing  Integrability 

The vector field $B^*$ estimated in Section 2 may not be integrable,
i.e., it may not correspond to a smooth surface. So, prior to
reconstructing the surface up to GBR, the integrability constraint must
be enforced on $B^*$. Since no method has been developed to enforce
integrability during the estimation of $B^*$, we enforce it afterwards.
That is, given $B^*$, estimate a matrix $A \in GL(3)$ such that
$B = B^*A$ corresponds to an integrable normal field; the development
follows [31].

Consider a continuous surface defined as the graph of $z(x,y)$, and let
$b(x,y) = (b_1, b_2, b_3)^T$ be the corresponding normal field scaled by
an albedo field. The integrability constraint for a surface is
$z_{xy} = z_{yx}$, where subscripts denote partial derivatives. In turn,
$b(x,y)$ must satisfy:

$$\left(\frac{b_2}{b_3}\right)_x = \left(\frac{b_1}{b_3}\right)_y.$$

To estimate $A$ such that $b^T(x,y) = b^{*T}(x,y)A$, we expand this out.
Letting the columns of $A$ be denoted by $A_1, A_2, A_3$ yields

$$(b^{*T}A_3)(b_x^{*T}A_2) - (b^{*T}A_2)(b_x^{*T}A_3) =
  (b^{*T}A_3)(b_y^{*T}A_1) - (b^{*T}A_1)(b_y^{*T}A_3),$$

which can be expressed as

$$b^{*T} S_1 b_x^* = b^{*T} S_2 b_y^*, \tag{3}$$

where $S_1 = A_3A_2^T - A_2A_3^T$ and $S_2 = A_3A_1^T - A_1A_3^T$.

$S_1$ and $S_2$ are skew-symmetric matrices and each has three degrees
of freedom. Equation 3 is linear in the six elements of $S_1$ and $S_2$.
From the estimate of $B^*$, discrete approximations of the partial
derivatives ($b_x^*$ and $b_y^*$) are computed, and then SVD is used to
solve for the six elements of $S_1$ and $S_2$. In [31], it was shown
that the elements of $S_1$ and $S_2$ are cofactors of $A$, and a simple
method for computing $A$ from the cofactors was presented. This
procedure only determines six degrees of freedom of $A$. The other three
correspond to the GBR transformation [4] and can be chosen arbitrarily
because a GBR transformation preserves integrability. The surface
corresponding to $B = B^*A$ differs from the true surface by GBR, i.e.,
$\bar{z}(x,y) = \lambda z(x,y) + \mu x + \nu y$ for arbitrary
$\lambda, \mu, \nu$ with $\lambda \neq 0$.
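
As an illustration of how Equation 3 becomes a linear system, the sketch
below writes each skew-symmetric matrix as a cross-product matrix,
$S_k c = s_k \times c$, so that $b^{*T} S_k c = s_k \cdot (c \times b^*)$.
This parameterization, the function name and the array layout are
assumptions made for the example; the recovery of $A$ from the cofactors
described in [31] is not shown.

```python
import numpy as np

def solve_integrability(b, bx, by):
    """Solve Eq. (3) for the six free elements of S_1 and S_2.

    b, bx, by : (n, 3) arrays holding b*, b*_x and b*_y at each pixel.
    Returns (s1, s2), defined up to one common scale factor, where
    S_k is the cross-product matrix of s_k.
    """
    # Eq. (3) per pixel:  s1 . (bx x b)  -  s2 . (by x b)  =  0
    M = np.hstack([np.cross(bx, b), -np.cross(by, b)])     # (n, 6)
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    v = Vt[-1]              # right singular vector of the smallest singular value
    return v[:3], v[3:]
```

With this convention $s_1$ is proportional to $A_2 \times A_3$ and $s_2$
to $A_1 \times A_3$, consistent with the elements of $S_1$ and $S_2$
being cofactors of $A$. In practice `bx` and `by` would come from finite
differences of the estimated basis images.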

3.2  Generating  a  GBR  surface 

After enforcing integrability, we can now reconstruct the corresponding
surface $z(x,y)$. Note that $z(x,y)$ is not a Euclidean reconstruction
of the face, but a representative element of the orbit under a GBR
transformation. Despite this, both the shading and the shadowing will be
correct for images synthesized from such a surface [4].

To find $z(x,y)$, we use the variational approach presented in [15]. A
surface $z(x,y)$ is fit to the given components of the gradient $p$ and
$q$ by minimizing the functional

$$\iint_\Omega \left[ (z_x - p)^2 + (z_y - q)^2 \right] dx\, dy,$$

the Euler equation of which reduces to $\nabla^2 z = p_x + q_y$. By
enforcing the right natural boundary conditions and employing an
iterative scheme that uses a discrete approximation of the Laplacian, we
can reconstruct the surface $z(x,y)$ [15].
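
A discrete analogue of this minimization can be sketched as follows. The
code fits a height map to forward-difference gradients by damped Jacobi
iteration on the discrete least-squares problem; the discretization, the
damping factor and the function name are assumptions for illustration
and are not necessarily the scheme of [15]. Boundary pixels simply have
fewer neighbours, which plays the role of the natural boundary
conditions.

```python
import numpy as np

def integrate_gradient(gx, gy, iters=5000, omega=0.8):
    """Least-squares height map z from forward differences.

    gx : (H, W-1) targets for z[i, j+1] - z[i, j]
    gy : (H-1, W) targets for z[i+1, j] - z[i, j]
    Returns z with zero mean (the absolute height is not recoverable).
    """
    H, W = gy.shape[0] + 1, gx.shape[1] + 1
    z = np.zeros((H, W))
    # Number of grid neighbours: 2 at corners, 3 on edges, 4 inside.
    deg = np.zeros((H, W))
    deg[:, :-1] += 1
    deg[:, 1:] += 1
    deg[:-1, :] += 1
    deg[1:, :] += 1
    for _ in range(iters):
        acc = np.zeros((H, W))
        acc[:, :-1] += z[:, 1:] - gx      # prediction from the right neighbour
        acc[:, 1:] += z[:, :-1] + gx      # prediction from the left neighbour
        acc[:-1, :] += z[1:, :] - gy      # prediction from the neighbour below
        acc[1:, :] += z[:-1, :] + gy      # prediction from the neighbour above
        z = (1.0 - omega) * z + omega * acc / deg   # damped Jacobi update
    return z - z.mean()
```

Plain Jacobi iteration converges slowly on large images; a multigrid or
direct sparse solver would be a natural replacement in practice.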

Recall that a GBR transformation scales the relief (flattens or
extrudes) and introduces an additive plane. To resolve this GBR
ambiguity, we take advantage of the fact that we are dealing with human
faces, which constitute a well known class of objects. We can therefore
exploit the left-to-right symmetry of faces and the fairly constant
ratios of distances between facial features such as the eyes, the nose,
and the forehead. (In the case when the class of objects is not well
defined, the issue of resolving the GBR ambiguity becomes more subtle
and is essentially an open problem.) A surface of a face that has
undergone a GBR transformation will have different distance ratios and
can be asymmetric. These differences allow us to estimate the three
parameters of the GBR transformation, which we can then invert. Note
that this inverse transformation is applied to both the estimated
surface $z(x,y)$ and $B$. Even though this inverse operation (which is
also a GBR transformation) may not completely resolve the ambiguity of
the relief because of errors in the estimation of the GBR parameters, it
nevertheless comes very close to that effect. After all, our purpose is
not to reconstruct the exact Euclidean surface of the face, but to
create realistic images of a face under differing pose and illumination.
Moreover, since shadows are preserved under GBR transformations [4],
images synthesized under an arbitrary light source from a surface whose
normal field has been GBR transformed will have correct shadowing. This
means that the residual GBR transformation (after resolving the
ambiguity) will not affect the image synthesis with variable
illumination.

Figure 2 shows the reconstructed surface of the face shown in Figure 1
after resolving the GBR ambiguity. The first basis image of $B^*$ shown
in Figure 1.b has been texture-mapped on the surface. Even though we
cannot recover the exact Euclidean structure of the face (i.e. resolve
the ambiguity completely), we can still generate synthetic images of a
face under variable pose where the shape distortions due to the residual
GBR ambiguity are quite small and not visually detectable.

4  Image  Synthesis 

We first demonstrate the ability of our method to generate images of an
object under novel illumination conditions but fixed pose. Figure 3
shows sample single light source images of a face generated with the
image formation model in Equation 1, which has been extended to account
for cast shadows. To determine cast shadows, we employ ray-tracing that
uses the reconstructed surface of the face $z(x,y)$ after resolving the
GBR ambiguity. Specifically, a point on the surface is in cast shadow
if, for a given light source direction, a ray emanating from that point
parallel to the light source direction intersects the surface at some
other point. With this extended image formation model, the generated
images exhibit realistic shading and, despite the small presence of
shadows in the images in Figure 1.a, have strong attached and cast
shadows.
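
The cast-shadow test can be sketched for a height map as below. This is a
simple ray-marching illustration with nearest-pixel sampling, not our
ray-tracer; the function name, the step size and the convention that
`light` points from the surface toward the distant source are
assumptions.

```python
import numpy as np

def cast_shadow_mask(z, light, step=0.5, eps=1e-6):
    """Boolean H x W mask that is True where a pixel lies in cast shadow.

    z     : (H, W) height map, rows indexing y and columns indexing x.
    light : (lx, ly, lz), direction from the surface toward the source.
    """
    H, W = z.shape
    l = np.asarray(light, dtype=float)
    shadow = np.zeros((H, W), dtype=bool)
    horiz = np.hypot(l[0], l[1])
    if horiz < 1e-12:
        return shadow        # light along the z axis: a height map cannot occlude itself
    d = l / horiz * step     # advance `step` pixels horizontally per sample
    for i in range(H):
        for j in range(W):
            x, y, h = float(j), float(i), float(z[i, j])
            while True:
                x, y, h = x + d[0], y + d[1], h + d[2]
                c = int(np.floor(x + 0.5))      # nearest column
                r = int(np.floor(y + 0.5))      # nearest row
                if not (0 <= r < H and 0 <= c < W):
                    break                       # ray left the surface: lit
                if z[r, c] > h + eps:
                    shadow[i, j] = True         # surface rises above the ray
                    break
    return shadow
```

The resulting mask would be combined with the attached-shadow clipping of
Equation 1 by setting the masked pixels to zero for that light source.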

Figure 2: The reconstructed surface.

Figure 4 displays a set of synthesized images of the face viewed under
variable pose but with fixed lighting. The images were created by
rigidly rotating the reconstructed surface shown in Figure 2 first about
the horizontal and then about the vertical axis. Along the rows from
left to right, the azimuth varies (in 10 degree intervals) from 30
degrees to the right of the face to 10 degrees to the left. Down the
columns, the elevation varies (again in 10 degree intervals) from 20
degrees above the horizon to 30 degrees below. For example, in the
bottom image of the second column from the left the surface has an
azimuth of 20 degrees to the right and an elevation of 30 degrees below
the horizon. The single light source follows the face around as it
changes pose. This implies that a patch on the surface has the same
intensity in all poses. It is interesting to see that the images look
quite realistic, with the possible exception of the three right images
in the bottom row, which appear to be a little flattened. This is not
due to any errors during the geometric or photometric modeling but
probably due to our visual priors; we are not used to looking at a face
from above.

In Figure 5, we combine both variations in viewing conditions to
synthesize images of the face under novel pose and lighting. We used the
same poses as in Figure 4 but now the light from the single point source
is fixed to come along the gaze direction of the face in the top-right
image. Therefore, as the face moves around and its gaze direction
changes with respect to the light source direction, the shading of the
surface changes and both attached and cast shadows are formed, as one
would expect. The synthesized images seem to agree with our visual
intuition.


Figure 3: Sample images of the face under novel illumination conditions
but fixed pose.

5  Discussion 

Appearance variation of an object caused by small changes in
illumination under fixed pose can provide enough information to estimate
(under the assumption of a Lambertian reflectance function) the object's
surface normal field scaled by its albedo. In the presented method, as
few as three images with no knowledge of the light source directions can
be used in the estimation. The estimated surface normal field can then
be integrated to reconstruct the object's surface. Unlike multi-view
based image synthesis, our approach does not require the determination
of point or line correspondences to do the surface reconstruction. Since
we are dealing with a well known class of objects, we can acceptably
resolve the GBR ambiguity of the reconstructed surface. Then, the
surface together with the surface normal field scaled by the albedo are
sufficient for synthesizing images of the object under novel pose and
lighting.

The effectiveness of our approach stems from three reasons. First, the
estimation of the illumination model $B^*$ does not use any invalid data
(such as shadows) which would otherwise lead to large biases. Second,
the integrability constraint is enforced on the surface normal field,
which significantly improves the surface reconstruction. Last, unlike
classical photometric stereo, our method requires no knowledge of light
source locations. This obviates the need for error-prone calibration of
a fixed lighting rig, where any errors in estimating the position of the
light sources can propagate to the estimation of the illumination model,
causing large inaccuracies. These reasons have led to improved
performance, and we have demonstrated this by synthesizing realistic
images of human faces.

Figure 5: Synthesized images under both variable pose and lighting. As
the face moves around, the single light source stays fixed, resulting in
image variability due to changes in pose and illumination conditions.


